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Bayesian inference provides a flexible way of combining data with 
prior information. However, quantile regression is not equipped with 
a parametric likelihood, and therefore, Bayesian inference for quan- 
tile regression demands careful investigation. This paper considers the 
Bayesian empirical likelihood approach to quantile regression. Tak- 
ing the empirical likelihood into a Bayesian framework, we show that 
the resultant posterior from any fixed prior is asymptotically nor- 
mal; its mean shrinks toward the true parameter values, and its vari- 
ance approaches that of the maximum empirical likelihood estimator. 
A more interesting case can be made for the Bayesian empirical likeli- 
hood when informative priors are used to explore commonality across 
quantiles. Regression quantiles that are computed separately at each 
percentile level tend to be highly variable in the data sparse areas 
(e.g., high or low percentile levels). Through empirical likelihood, the 
proposed method enables us to explore various forms of commonality 
across quantiles for efficiency gains. By using an MCMC algorithm in 
the computation, we avoid the daunting task of directly maximizing 
empirical likelihood. The finite sample performance of the proposed 
method is investigated empirically, where substantial efficiency gains 
are demonstrated with informative priors on common features across 
several percentile levels. A theoretical framework of shrinking priors 
is used in the paper to better understand the power of the proposed 
method. 

1. Introduction. Quantile regression is a statistical methodology for the 
modeling and inference of conditional quantile functions. Following Koenker 
and Bassett (1978), we specify the rth conditional quantile function of Y 6 M. 
given X £ W +1 as 

(1.1) Q t (Y\X) = X t I3(t), 
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where r € (0, 1), and /3(r) typically includes an intercept. Quantile modeling 
of this type can be estimated for one or several percentile levels; we refer 
the details on computation and basic asymptotic theory to Koenker (2005). 
Inferential methods for quantile regression have been developed by a num- 
ber of researchers, including Gutenbrunner and Jureckova (1992), Horowitz 
(1998), Chen et al. (2008) and Kocherginsky, He and Mu (2005). The r- 
specific models allow for great flexibility, as /3(r) for upper or lower quantiles 
can be distinct from central trends, but the quantile estimates are highly 
variable in data-sparse areas. Taking advantage of some commonality in the 
quantile coefficients /3(r) across r can provide a desirable balance in the 
bias-variance tradeoff. In this article, we consider using prior information 
on (3(t) across several r values. For example, a common slope assumption 
for r near 1 can improve the efficiency of high quantile estimation. Other 
forms of informative priors on /3(r) may achieve a similar goal. Bayesian 
methods are a natural way of combining data with prior information. The 
main difficulty in putting the Bayesian method to work for quantile regres- 
sion is that the model on Q T (Y\X) for one or any small number of r values 
does not specify a parametric likelihood, which is needed in the Bayesian 
framework. 

Several authors have attempted to use a working likelihood in the Bayesian 
quantile regression framework. Kottas and Gelfand (2001) and Kottas and 
Krnjajic (2009) used Dirichlet process mixture models. Reich, Bondell and 
Wang (2008) assumed the error distributions to be an infinite mixture of 
normals. Dunson and Taylor (2005) used an approximate method based on 
the Jefferey's substitution likelihood for quantiles. Yu and Moyeed (2001), 
Geraci and Bottai (2007) and Yue and Rue (2011), among others, chose 
(asymmetric) Laplace distributions as the working likelihood. Those ap- 
proaches, mostly tailored toward a specific percentile level of r, use Markov 
chain Monte Carlo algorithms as a useful means of computation. Work of 
these authors provided numerical evidence that a Bayesian approach to 
quantile regression has merits. 

In this article, we focus on estimating several quantiles together. To do so, 
we use the empirical likelihood (EL), introduced by Owen (1988), to incor- 
porate quantile regression into a (pseudo-) Bayesian framework. Empirical 
likelihood makes it easy to model several quantiles at the same time, allow- 
ing informative priors on /3(r) across r to be utilized. Statistical inference 
based on empirical likelihood is known to enjoy good asymptotic proper- 
ties, especially if the EL is associated with moment restrictions of sufficient 
smoothness. Molanes Lopez, Van Keilegom and Veraverbeke (2009) consid- 
ered the EL with nonsmooth estimating equations under a general setting. 
A more comprehensive review about empirical likelihood can be found in 
Owen (2001) and Chen and Van Keilegom (2009). Since the moment re- 
strictions for quantiles are placed on nonsmooth functions, some researchers, 
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including Chen and Hall (1993), Whang (2006) and Otsu (2008) proposed 
using smoothed versions of the quantile estimating equations. The smoothed 
EL is further extended to weakly dependent processes in Chen and Wong 
(2009) and censored data in Ren (2008). We choose to focus on the ex- 
act moment conditions for quantiles without the complication of choosing 
a smoothing parameter. Those moment conditions are also used in Wang 
and Zhu (2011) and Kim and Yang (2011) for clustered data. In addition, 
we use a standard MCMC algorithm to explore the posterior, to avoid the 
daunting task of directly maximizing the empirical likelihood. In fact, the 
EL function given any proposed parameters is relatively easy to compute, 
even though the EL-maximization is notoriously difficult, even in modest 
dimensions. 

The empirical likelihood is not a likelihood in the usual sense, so the valid- 
ity of the resultant posterior does not follow automatically from the Bayes 
formula. Lazar (2003) discussed the validity of inference for the Bayesian 
empirical likelihood (BEL) approach based on earlier work of Monahan and 
Boos (1992). Schennach (2005) and Lancaster and Jun (2010) considered 
Bayesian exponentially tilted empirical likelihood (ETEL), which can be 
viewed as a nonparametric Bayesian procedure with noninformative pri- 
ors on the space of distributions. Lancaster and Jun (2010) further con- 
sidered Bayesian ETEL in quantile regression. For the inference of popula- 
tion means, Fang and Mukerjee (2006) investigated the asymptotic validity 
and accuracy of the Bayesian credible regions, and furthermore, Chang and 
Mukerjee (2008) showed that EL admits posterior based inference with the 
frequentist asymptotic validity, but many of its variants do not enjoy this 
property. In this article, we establish the asymptotic distributions of the 
posterior from the BEL approach for quantile regression, which enable us to 
evaluate efficiency gains from informative priors. Chernozhukov and Hong 
(2003) discussed the asymptotic properties of the quasi-posterior distribu- 
tions defined as transformations of general statistical criterion functions. In 
our work, we establish the asymptotic distributions of the posterior from the 
BEL approach for quantile regression, and are particularly interested in the 
interaction of informative priors and empirical likelihood on the asymptotic 
distribution of the posterior, which enables us to evaluate efficiency gains 
from informative priors. 

Ideas similar to BEL have been used by other researchers. Yin (2009) 
proposed the Bayesian generalized method of moments (GMM), which can 
be adapted to quantile estimation. Hahn (1997) considered Bayesian boot- 
strap in quantile regression. Note that the GMM estimators are also defined 
through moment restrictions, which allow them to model multiple quantiles 
jointly. The GMM estimators, the maximum empirical likelihood estimators 
(MELE) and some other EL-type estimators generally have the same asymp- 
totic distributions, but possibly different higher order asymptotic properties; 
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see Newey and Smith (2004) and Schennach (2007). As discussed in Newey 
and Smith (2004), the empirical likelihood approach has advantages over 
the GMM estimators. Unlike GMM, the (asymptotic) bias of the MELE 
does not grow with the number of moment restrictions. Furthermore, the 
efficiency of the GMM estimator relies on a covariance matrix estimate for 
the estimating equations, which could be ill-conditioned when estimating 
multiple quantiles. 

The recent development of Bayesian (conditional) density estimation us- 
ing mixture models enables nonparametric regression models on all quan- 
tiles simultaneously; see Miiller, Erkanli and West (1996), Mxiller and Quin- 
tana (2004), Dunson, Pillai and Park (2007) and Chung and Dunson (2009), 
among others. Theoretical results about posterior consistency can be found 
in Pati, Dunson and Tokdary (2010), Norets and Pelenis (2010) and the 
references therein. In contrast, our proposed BEL approach targets a small 
number of selected quantiles without the need to model the entire conditional 
distributions. A novel part of our work is its ability of employing informative 
priors to explore commonality across quantiles for efficiency gains. 

The rest of the paper is organized as follows. In Section 2, we introduce the 
proposed BEL approach for quantile regression, and discuss model assump- 
tions, method of computation and use of informative priors. The asymptotic 
properties on the BEL posteriors are provided in Section 3 for both fixed 
and a class of shrinking priors. The theoretical framework of shrinking pri- 
ors enables us to understand the efficiency gains of the BEL approach over 
traditional methods. Section 4 demonstrates the finite sample performance 
of the BEL approach through Monte Carlo simulations with a focus on 
frequentist properties of BEL posterior intervals, and efficiency gains from 
informative priors. In Section 5, we use a real data example to show that 
the BEL approach can be used as a useful statistical downscaling method 
for the projection of high quantiles of temperature from large scale climate 
models to a local scale. Some concluding remarks are given in Section 6. The 
technical details to support the theorems in Section 3 are provided in the 
Appendix. 

2. Bayesian empirical likelihood for quantile regression. In this section 
we introduce the Bayesian empirical likelihood approach for quantile regres- 
sion. We begin with notation and definitions of the underlying models and 
moment restrictions. Let D = {(Xi,Yi),i = l,...,n} be a random sample 
from the following quantile regression model: 



where X € W +1 is composed of an intercept term and p covariates. We as- 
sume that the distribution of the p covariates, Gx, has a bounded support X . 
If the design points are nonstochastic, the basic conclusions we obtain in 
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this paper hold under appropriate conditions on the design sequence, but 
we focus on the case of random designs for simplicity. The unknown func- 
tion Po(t), if specified over all r € (0,1), describes the entire conditional 
distribution of Y given X, which is denoted as Fx in the rest of the paper. 
We consider the problem of estimating k quantiles at t\ < T2 < • • • < , and 
let Co = (/3o( r i)> • • • j A)( r fc)) b e the true parameter of interest in R fc ( p+1 ). In 
most applications, k is a small integer. To estimate (0, we use k(p + 1) di- 
mensional estimating functions m(X, Y, £), where £ = (/3(ti), . . . , /3(rfc)) and 
the components of m are 

(2.2) m dk+j (X,Y,C) =ifj Td+1 (Y-X T p(r d+1 ))X j 

for d = 0, 1, . . . , k — 1, j = 0, 1, . . . ,p, with 



i> T (u) 



l{u<o} - T > uj^O, 
0, u = 



being the quantile score function, where lr^i is an indicator function on 
the set A. We hasten to add that £ may contain fewer than k(p + 1) un- 
known parameters when some common parameters are present in /3(r) at 
different quantile levels. In such cases, the number of moment restrictions 
exceeds the number of unknown parameters. As shown in Qin and Lawless 
(1994) for smooth estimating functions, the maximum empirical likelihood 
estimator attains the optimal asymptotic efficiency subject to those moment 
conditions. We expect the same for quantile estimating functions. 
For any proposed C, its profile empirical likelihood ratio is given by 



(2.3) tt(C)=max^IJ(nwi) ^c^m^, Y h () = 0, Ui > 0, = 1 \. 

U=l i=l i=l J 

By a standard Lagrange multiplier argument, we have 

n 

tt(o=nwc)}, 

i=l 

where the weights = [n{l + X n (C) T m(Xi, Yi, C)}] _1 5 an d the Lagrange 
multiplier X n (C) satisfies the following equation: 

E m(Xi,Yj,C) = 

As discussed in Chen, Sitter and Wu (2002) and Qin and Lawless (1994), 
the existence and uniqueness of X n (C) are guaranteed when the following 
two conditions are satisfied: 

(CI) The vector G is within the convex hull of {m(X i ,F i ,C), 

i = l,...,n}. ^ 

(C2) The matrix ^2™ =l {m(Xi,Yi,Qm(Xi,YiX) T } is positive definite. 
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The first condition (CI) actually provides a feasible region of £ supported by 
the observations D, in which the proposed £ has a valid empirical likelihood 
value. If Yi < Xj !3{T d ) at some Td for all i = 1, . . . ,n, this proposed £ will 
violate the first condition, and then we regard its empirical likelihood value 
as 0. The second condition (C2) requires the set of estimating functions to 
be linearly independent. Noting that 

E{m(X, Y, Co)m(X, Y, ( ) T } = ¥ ® E(XX T ), 

where the elements of the matrix are VPy = t% A tj — TiTj, the second 
condition is generally satisfied for £ near (q, as long as E(XX T ) is positive 
definite. 

For any proposed £, consider its empirical likelihood function lZ{C,)/n n = 
nr=i W «(C)- With a prior specification po(C) on the parameter £, we can 
formally have the posterior density 

(2.4) p(C|D)ocp (C) xft(C)- 

We call p(£|.D) the posterior distribution from the BEL approach. This can 
be viewed as a misnomer, chosen for the sake of convenience, because it is not 
really a posterior in the strict sense. Lazar (2003) proposed a procedure to 
check whether the empirical likelihood is valid for posterior inference based 
on the criteria provided in Monahan and Boos (1992). In this paper, we 
focus on the asymptotic properties of the posterior distribution (2.4), and 
establish its frequentist validity by first-order asymptotics. 

Finding the maximum empirical likelihood estimator is a daunting task 
computationally, because the objective function is generally multi-modal. 
However, the value of the empirical likelihood ratio is relatively easy to 
compute given £, which makes the Metropolis-Hastings algorithm, as given 
in Hastings (1970), feasible for sampling from the posterior. By choosing 
a proper prior, the posterior in (2.4) is also proper. Therefore, by checking 
the detailed balance equation and Theorem 4.2 in Gilks, Richardson and 
Spiegelhalter (1996), the distribution of the MCMC sampler converges to 
the posterior in (2.4). More discussions on computation efficiency can be 
referred to Chernozhukov and Hong (2003). A Bayesian framework has its 
own merits in applications where informative priors on /3(r) might be more 
appropriate than a strict functional relationship on some of the parameters. 
For example, we may believe that the slopes in /3(ti) are roughly the same 
as in P(t2)- Imposing strict equalities to reduce the number of unknown 
parameters in £ might be hard to justify, but an informative prior on the 
difference of two neighboring j3(r) can help regularize quantile estimation. 

By using a standard Metropolis-Hastings algorithm for a given prior po(C), 
we may use the average of the Markov chain on £ as an estimate of £, when 
the posterior looks close to normal; otherwise, we suggest using the mode 
of the posterior, which maximizes (2.4). In the empirical investigations in 
Sections 4 and 5, we use the posterior mode as the estimates. 
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In our empirical investigations, we have found that the posterior mode of 
the slope parameters behaves well, but the intercept parameter in each /3(r) 
can be better estimated in small samples if the following strategy is fol- 
lowed. Suppose that j3(r) = (/3/(r), /%(t)), where /3/(r) corresponds to the 
intercept, and f3s( T ) corresponds to the slope. Let /3s (t) be the posterior 
mode/mean obtained from the MCMC chain, we use the modified esti- 
mate $i(t) as the rth sample quantile of Y\ — Xjj/^r), where X$i cor- 
responds to Xi excluding the intercept term. This modification does not 
alter the asymptotic distributions of the /3(r). In the rest of the paper, we 
always use this modification in the BEL estimate of quantile regression. 

3. Asymptotic properties of BEL. In this section, we provide an asymp- 
totic justification of the BEL estimator for quantile regression by deriving 
the limiting behavior of the posterior distribution as n — > oo. One noticeable 
point about the estimating equations (2.2) is that they involve indicator 
functions, so the resulting empirical likelihood ratio is nonsmooth in C- An 
asymptotic normality of the posterior distribution in the Bayesian empirical 
likelihood context was derived heuristically in Lazar (2003) for smooth esti- 
mating equations. We rely on empirical process theory to establish a similar 
result for the BEL here. 

As the first step, we shall prove the consistency of the maximum em- 
pirical likelihood estimator (MELE), which is a necessary condition for the 
asymptotic normality of the posterior. 

3.1. Consistency of the MELE. We assume that the true parameter Co 
falls into a compact set of the parameter space, and the optimization is 
carried out over this compact set. For notational convenience, let 

C = argmax{7£(£)} 

be the MELE, whose dependence on n and the compact set on Q have been 
suppressed in our notation. Note that the maximum empirical likelihood 
estimate might not be unique, but the result here applies to any maximizer 
of the empirical likelihood ratio, and all the maximizers converge to the 
same asymptotic value. 

The estimating functions m(X, Y, C) are not smooth in but it is worth 
noting that the expectations of m(X, Y, C) and the empirical likelihood func- 
tion are sufficiently smooth under the following assumptions. 

Assumption 3.1. There exists a neighborhood M of Co such that 
P (11(C) > 0) -> 1 for any ( G A/", as n oo. 

Assumption 3.2. The distribution function Gx has bounded support X . 

Assumption 3.3. The conditional distribution Fx(t) of Y given X is 
twice continuously differentiable in t for all X G X. 
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Assumption 3.4. At any X e X, the conditional density function 
F x (t) = fx(t) > for t in a neighborhood of F x l {Td) for each d = 1, . . . , k. 

Assumption 3.5. E{m(X, Y, ( )m(X,Y, Co) T } is positive definite. 

Assumption 3.1 is to guarantee that the interior of the convex hull of 
{m(Xi, Yi, C) - i = 1, • ■ • , n} for £ G M contains the vector of zeros with prob- 
ability tending to one. By (2.1), Fx(X T fioijd)) = for any d < k and 
X E X. Therefore, for each d, (3o(Td) is a solution to E{mdk+j(X, Y, £)} = 0, 
j = 0,...,p. Under Assumption 3.4, /3o(r^) is indeed the unique solution. 
Correspondingly, £o is the unique solution for E{m(X, Y, £)} =0. 

Theorem 3.1. Under Assumptions 3.1-3.5, the MELE £ is a consistent 
estimator of Co ■ 

The proof of Theorem 3.1 is sketched in the Appendix. The basic idea is 
to check the conditions for consistency appearing in Theorem 5.7 of van der 
Vaart (1998). Because those conditions require some uniform convergence 
properties for collections of functions involving m(X, Y, £), we use the em- 
pirical process theory as a natural tool. 

3.2. Asymptotic normality of the posterior. To validate the asymptotic 
normality of the posterior distribution (2.4), we make one more assumption. 

Assumption 3.6. log{po(C)} nas bounded first derivative in a neigh- 
borhood of £o- 

Then we have the following theorem. 

Theorem 3.2. Under Assumptions 3.1-3.6, the posterior density of Q 
has the following expansion on any sequence of sets {C : C — Co = 0(n -1 / 2 )}: 

(3.1) p((\D) a exp{-i(C - () T J n (( - C) + R n }, 

where £ is the MELE, 

J n = nV^V u 1 V m 
Vu = V®E(XX T ), 

dE{m(X,Y,C)} 

Vl2 = a? 

d t C=Co 

1/2 

and R n = o p (l) . When J n is positive definite, we have J n (£ — £) converging 
in distribution to N(0,I). 

There are clear similarities between Theorem 3.2 here and Theorem 1 
of Lazar (2003) for smooth estimating equations. We have considered fixed 
priors, a common scenario in the literature, where the limiting posterior 
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distributions of £ are the same as the limiting sampling distribution of the 
MELE [cf. Qin and Lawless (1994)]. An important remark follows. 

Remark 3.1. The results in Theorem 3.2 apply to the cases where the 
dimension of £ is smaller than the dimension of the estimating functions 
m(X, Y,£). For £ with a reduced dimensionality, the definition of V12 is 
taken to be the derivative with respect to the reduced parameter vector. 

Asymptotically Theorem 3.2 justifies the use of the BEL approach 
for quantile regression with respect to frequentist properties. When 
/x(A T /3o(r^)) = f Td is constant for all X, which is true for homoscedastic 
error models, we can simplify V\i to 

Via = — diag(/ Td )rf = i v .. j fc ® E(XX T ), 

if C is of k(p + 1) dimensions. Because V\\ = <8> £(XX T ), the resultant 
asymptotic variance of the posterior quantity, J" 1 , is equivalent to the 
asymptotic variance of the usual quantile regression (RQ) estimates, as pro- 
posed in Koenker and Bassett (1978). This property is not shared by all 
working likelihoods. If £ is of lower dimensions, the posterior variance no 
longer takes the same form, and improvements in the asymptotic variances 
over RQ become possible. 

Remark 3.2. An improper prior cannot guarantee a proper posterior 
distribution. In fact, the posterior will be improper for flat priors on £ in 
the BEL approach, and therefore we should avoid using flat priors on (. 

Next, we consider a more interesting scenario where the prior distribution 
shrinks with n. In this case, we use Po,n(C) as priors, and make the following 
assumption. 

Assumption 3.7. The logarithm of the prior density Po,n(C) is twice 
continuously differentiable, with the prior mode £o,n = 0(1), and the matrix 

Jo,n = - ^ 0g{ ^ n(0} \c=Co, n = 0(n). 

By Assumption 3.7, log{po,n(C)} can be Taylor expanded up to the quad- 
ratic term as follows. 

log{p ,n(C)} = log{pOn(Co,n)} 

(3.2) 

- |(C - Co,n) T Jo,n(C - Co.n) + o(\\C - C0,n|| 2 ). 

Then we have the following result. 

Theorem 3.3. Under Assumptions 3.1-3.5 and 3.7, the posterior den- 
sity of £ has the following expansion on any sequence of sets {C : IK — Coll = 
0{n~ 1 / 2 )}: 

(3.3) p((\D) a exp{-i(C - 6 post ) T J n (( - 6 post ) + R n }, 
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where 

"post — 

and R n = o p {\). 

Compared to Theorem 3.2, the additional term Jq u in both J n and # p0 st 
in Theorem 3.3 provides a balanced view of when and how an informative 
prior can complement the likelihood in large samples. When Jo >n = o p (n), 
the posterior expansion in Theorem 3.3 is the same as that of Theorem 3.2, 
so the empirical likelihood will dominate the prior information. Obviously, 
if Jo, n increases at a faster rate than n, the prior will dominate the empirical 
likelihood. For the more interesting case where Jo, n increases at the rate 
of n, the BEL produces a consistent estimate of Co if ||Co,n — Coll = o p (l); 
otherwise, post may not converge to Co i n probability, that is, a bias may 
be introduced, but the variance is reduced. In the latter case, the posterior 
in (3.3) does not directly lead to asymptotically valid posterior inference. 
However, noting that J n = Jo, n +nV^ VnVis and Jo, n is known, the MCMC 
chain provides an estimate of the matrix nV^VuV^, which is what we need 
to obtain asymptotically valid confidence intervals. 

Shrinking priors are relevant when the informative priors are constructed 
from data of a secondary source or when the hypothesis on common slope 
parameters are not rejected by a statistical test. 

In Theorem 3.3, the prior mode Co,n plays a role in the posterior mean, 
which could be undesirable. For shrinking toward common slopes, we can 
use a class of priors that eliminate the bias due to a mis-specified prior mode 
when the common slope assumption holds. For each d = 1, . . . , k, let g& be 
a spherically symmetric distribution with zero as its center as well as its 
mode, and with a finite second order derivative at zero. We consider a prior 
on C as 

fi- 1/2 G0(Ti)-A>,o)~0i and 

(3.4) 

S; 1/2 (/3(r d ) - /3(n)) |/3(n) ~ g d for d = 2,...,k 

for any location vector f3 Pt o and scatter matrices O and Y,^ of appropriate 
dimensions. They vary with n in our theory, but we have suppressed the 
dependence in notation. If we write 

Y d ,s 

where E^/ and ^d,s represent the components of corresponding to the 
intercept and the slope parameters in /3(r^), respectively, for d = 2, . . . , k, we 
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now assume 

(3.5) Hfi-^HO^), ||£-}||=0(e n ) and ||S d , s || = 0(n~ l ) 
for some sequence e n = o(n). We have the following corollary. 

Corollary 3.4. Suppose that the same conditions of Theorem 3.3 hold. 
If the slope parameters in Co are the same at n,...,T]- ; and a (shrinking) 
prior satisfying (3.4) and (3.5) is used, the posterior mean of Theorem 3.3 
becomes 9 post = Co + O p (e n /n + n~ 1/2 ). 

Clearly, Corollary 3.4 indicates that the center of the posterior is asymp- 
totically unbiased for Co with common slopes regardless of what the prior 
mode /3 Pj o is for fiijd). All we need is to allow the prior variances of the 
slope differences to be in the order of 1/n, but the prior variances of the 
other parameters increasing with n. The idea of constructing such a class 
of shrinking priors applies more broadly than what we have considered here 
with common slopes, but in our empirical work to be reported, only inde- 
pendent normal and t-distributions will be used as gd- 

4. Simulation studies. In this section, we use Monte Carlo simulations 
to investigate the performance of the BEL methods (coverage probability 
and estimation efficiency) from the frequentist viewpoint. We use the fol- 
lowing notation to distinguish BEL estimators with various priors on the 
slope parameters. The usual quantile regression estimation at each r will be 
denoted simply as RQ. 

• BEL.s: BEL estimators of single quantiles using moment restrictions at 
each r. 

• BEL.c: BEL estimators based on joint moment restrictions assuming a com- 
mon slope parameter at several r's. 

• BEL.n: BEL estimators based on joint moment restrictions assuming that 
the differences in slope parameters across r's have normal priors with zero 
mean and "small" variances. 

4.1. Coverage properties. We first take a brief look at the coverage prob- 
abilities of the posterior credible intervals obtained under BEL.s. To see the 
impact of empirical likelihood, we also include in the comparison two other 
Bayesian methods, one based on the true parametric likelihood, and the 
other based on a working likelihood. 

The data are generated from Yi = f3i + Ps(X% — 2) + a (i = 1, . . . , n), where 
the true parameters are /3j = 2, /3s = 1, Xi and ei are independently gener- 
ated from the chi-square distribution with 2 degrees of freedom and iV(0,4), 
respectively. We are interested in estimating the median regression coeffi- 
cients /3/(0.5) and /3s(0.5). Independent priors of N(0, 100 2 ), are used on 
both parameters. We use the 2.5th and the 97.5th percentiles of the Markov 
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chain from BEL.s for r = 0.5 to form 95% interval estimates for the parame- 
ters. The simulation study uses three different sample sizes n = 100, 400, 1600 
to see whether the intervals have desirable coverage probabilities for mod- 
estly large n. 

In addition to BEL.s, we include two other Bayesian methods: 
• BTL: the Bayesian method using the true likelihood 

-i , f b- 0/(0.5) -/Ssf(0.5)(xi- 2) 



i=l 



a 



where <j) is the density of the standard normal distribution. 

BDL: a pseudo Bayesian method using the Laplace density as the working 

likelihood 



n 



a 1 exp 



|j/i-/3j(0.5)-fty(0.5)(xi-2)| 
2a 



i=l 

where a is estimated by the mean of the absolute residuals from the RQ 
estimate at r = 0.5. 

Similar MCMC sampling algorithms are used for all the three methods. 
The BTL method can be viewed as a yardstick for any MCMC based method, 
because it uses the true parametric likelihood under the model, which is 
generally unknown in practice. The reason to consider BDL is that the 
exponential component of its working likelihood is the objective function 
of median regression. The BDL method has been used earlier by Yu and 
Moyeed (2001) among others, but in our empirical work, we have chosen to 
use a fixed value of a in BDL, because we have found that the MCMC chains 
have better mixing properties without including a as an unknown parameter. 
A sensible value of a to use in BDL is the RQ-based scale estimate. Table 1 



Table 1 

Comparison of 95% posterior intervals of the median regression parameters from three 
methods: (1) BEL.s, (2) BTL based on the true likelihood and (3) BDL based on 
a working Laplace likelihood. The coverage probability and lengths of the posterior 
intervals are computed over 1000 data sets of sample sizes n = 100, 400 and 1600 

Coverage of 95% CI Length of 95% CI 

i BEL.s BTL BDL BEL.s BTL BDL 



100 




0.97 


0.94 


0.98 


1.06 


0.80 


1.11 




M0.5) 


0.98 


0.94 


0.98 


0.58 


0.41 


0.58 


400 


MO-5) 


0.97 


0.95 


0.98 


0.43 


0.40 


0.55 




As(0.5) 


0.94 


0.95 


0.98 


0.22 


0.20 


0.28 


1600 


M0.5) 


0.96 


0.96 


0.97 


0.25 


0.21 


0.28 




/3s (0.5) 


0.96 


0.96 


0.98 


0.13 


0.10 


0.14 
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provides the average coverage probability and average length information 
for each of the three methods over 1000 samples at each choice of n. 

This simple simulation study shows that as the sample size increases, the 
posterior intervals obtained from BEL.s and BTL approach the nominal lev- 
els 95%, although the convergence is not as fast as we might have expected. 
Because the underlying model has i.i.d. normal errors, the asymptotic rel- 
ative efficiency of BEL.s and BDL are approximately 67% of BTL, which 
helps explain the differences in the interval lengths. We also note that BEL.s 
outperforms BDL by the frequentist measures, even after we fixed the scale 
parameter in BDL. 

Similar phenomena were observed in the interval estimation for other 
quantiles and under several other error distributions, but we skip the de- 
tails. A more extensive report on estimation efficiency is given in the next 
subsection. 

4.2. Efficiency of BEL under various priors. In this section, we investi- 
gate the estimation efficiency of BEL.s, BEL.c and BEL.n for £ at different 
percentile levels, where the posterior modes are taken as the parameter esti- 
mates. The estimation efficiency is measured by the estimated mean squared 
error (MSE), with data generated from the following four models: 

• Model 1: Y = X + Z + e, where X ~ x 2 (2), Z/2 ~ Bernoulli(0.5) and 
e ~ iV(0,4), with X , Z and e being mutually independent; 

• Model 2: same as Model 1 except that log(e) ~ N(0, 1); 

• Model 3: Y = X + Z + (X/2 + l)e, where X ~ x 2 (2), Z/2 ~ Bernoulli(0.5) 
and e ~ iV(0,4), with X, Z and e being mutually independent; 

• Model 4: same as Model 3 except that log(e) ~ N(0, 1). 

These models include two covariates, of which X is continuous, and Z is 
binary. Models 1 and 2 assume homoscedastic errors, and Models 3 and 4 
allow the error distributions to depend on X. We use b x (r), b z {j) to denote 
the two slope parameters, and consider the adjusted intercept a(r) as the 
fitted value of the rth quantile at the sample mean of (X, 0). The reason 
that we consider this adjusted intercept in the study, instead of the raw 
intercept, is that the fitted value at the average design point of is a more 
meaningful value than the fitted value at the origin, which lies outside of 
the design space. 

The three BEL methods (BEL.s, BEL.c and BEL.n) will be compared 
with RQ and the composite quantile regression (CQR) of Zou and Yuan 
(2008). The CQR assumes common slopes, and minimizes the sum of indi- 
vidual quantile loss functions over several r's of interest. The CQR is a direct 
competitor of BEL.c, because they make the same assumption. 

For Models 1 and 2, the common slope assumption holds, so there is no 
asymptotic bias for any of the methods we consider here. Table 2 shows 
the asymptotic efficiencies of BEL.c and CQR relative to RQ, when several 
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Table 2 

The table presents the ratio of the asymptotic MSE of the RQ estimators over that of the 
BEL.c or CQR estimator for Models 1 and 2, when jointly estimating quantiles at 
t = 0.25, 0.5, 0.75 and r = 0.9,0.925,0.95, respectively 



Asymptotic relative efficiencies for slope estimators 





t = 0.25 


r = 0.5 


r = 0.75 r = 0.9 


r = 0.925 


r = 0.95 








Model 1 






BEL.c/RQ 


1.598 


1.352 


1.598 1.029 


1.219 


1.572 


CQR/RQ 


1.590 


1.345 


1.590 0.984 


1.166 


1.504 








Model 2 






BEL.c/RQ 


1.006 


3.280 


14.942 1.032 


1.677 


3.261 


CQR/RQ 


0.541 


1.763 


8.032 0.756 


1.227 


2.386 



quantiles are estimated jointly. It is clear that BEL.c and CQR are similar 
in efficiency for Model 1, but BEL.c stands out for Model 2. The asymptotic 
efficiency of BEL.s and that of RQ are the same; both of them are improved 
on by the other methods. Table 2 also includes comparisons at joint esti- 
mation of three quartiles, to indicate that the efficiency gain of BEL.c and 
CQR from the comparisons are not limited to high quantiles. 

The asymptotic efficiencies do not depend on the choices of fixed priors. 
We now focus on estimation of high quantiles with r = 0.9, 0.925, 0.95 at the 
sample size of n = 100, with the following priors: 

• For BEL.s and BEL.c, we use the prior iV(0, 100 2 ) for each intercept pa- 
rameter, and N(l, 100 2 ) for each slope parameter. 

• For BEL.n, we use the prior iV(0, 100 2 ) for each intercept parameter, and 
N(l, 100 2 ) for 63,(0.9) and for b z (0.9). The informative priors used to regu- 
late the differences between quantiles are, conditional on /3(0.90), b x (0.925) ~ 
iV(6 x (0.9),0.16), 63,(0.95) ~ JV(6a.(0.9),l), 6^(0.925) ~ iV(6 z (0.9), 0.01) and 
b z (0.95) ~N(b z (0.9), 0.01). 

Additional details of the Bayesian computations can be found in the sup- 
plemental material [Yang and He (2012)]. The MSE's of various estimators 
of /3(r) are given in Table 3 for Models 1 and 2, and in Table 4 for Models 3 
and 4. We make several observations from those results: 

• The performance of BEL.s is similar to or slightly better than that of RQ. 

• When the common slope assumption holds, BEL.c has about the same 
(Model 1) or better (Model 2) efficiency when compared with CQR. The 
estimators that use informative priors on the slope parameters all improve 
on RQ. The differences among various methods are more significant at 
upper quantiles (say r = 0.95) for heavier-tailed distributions. 

• In Models 3 and 4, where the common slope assumption does not hold 
for 63,(7"), BEL.c and CQR show efficiency gains on the estimation of b z {r), 
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Table 3 

The table gives the n x MSE 's of several estimators for the adjusted intercepts and slope 
parameters at three quantile levels t = 0.9, 0.925, 0.95 for Models 1 and 2, where n = 100, 
and the MSE is averaged over 500 samples from each model. The numbers in the 
brackets are the estimated standard errors 



Adjusted intercepts Slopes 
Method a(0.9) a(0.925) a(0.95) 6^(0.9) b z (0.9) b x (0.925) b z (0.925) b x (0.95) b z (0.95) 



Model 1 



BEL.s 


22.0 


26.5 


35.6 


3.0 


11.7 


3.5 


13.8 


4.2 


19.8 




(1.2) 


(1.5) 


(2.1) 


(0.2) 


(0.7) 


(0.2) 


(0.8) 


(0.3) 


(1.2) 


BEL.n 


23.1 


25.7 


31.5 


3.3 


12.3 


3.4 


12.3 


3.9 


12.4 




(1.4) 


(1.6) 


(1.8) 


(0.2) 


(0.8) 


(0.2) 


(0.8) 


(0.2) 


(0.8) 


BEL.c 


26.6 


27.9 


34.1 


3.4 


13.9 


3.4 


13.9 


3.4 


13.9 




(1.6) 


(1.7) 


(2) 


(0.2) 


(0.8) 


(0.2) 


(0.8) 


(0.2) 


(0.8) 


CQR 


22.8 


25.7 


30.0 


3.2 


12.7 


3.2 


12.7 


3.2 


12.7 




(1.4) 


(1.5) 


(1.8) 


(0.2) 


(0.8) 


(0.2) 


(0.8) 


(0.2) 


(0.8) 


RQ 


22.3 


26.9 


36.5 


3.3 


12.1 


3.7 


14.4 


4.4 


19.2 




(1.3) 


(1.7) 


(2.2) 


(0.2) 


(0.7) 
Model 


(0.3) 
2 


(0.9) 


(0.3) 


(1.2) 


BEL.s 


76.4 


126.6 


291.3 


9.5 


42.4 


13.5 


71.7 


26.2 


159.2 




(5.6) 


(10.5) 


(32) 


(0.9) 


(3-1) 


(1.1) 


(5.3) 


(2.7) 


(15.4) 


BEL.n 


78.7 


95.0 


150.0 


9.4 


43.6 


10.3 


43.8 


14.5 


43.7 




(5.9) 


(6.1) 


(9.1) 


(0.8) 


(3-3) 


(0.8) 


(3-3) 


(1.1) 


(3-3) 


BEL.c 


86.8 


100.5 


158.3 


9.1 


46.9 


9.1 


46.9 


9.1 


46.9 




(7.5) 


(8.1) 


(10.1) 


(0.8) 


(4.1) 


(0.8) 


(4.1) 


(0.8) 


(4.1) 


CQR 


109.3 


125.5 


175.9 


12.7 


61.7 


12.7 


61.7 


12.7 


61.7 




(11.1) 


(11.3) 


(15.5) 


(1.2) 


(5.2) 


(1.2) 


(5.2) 


(1.2) 


(5.2) 


RQ 


76.4 


136.4 


280.6 


10.0 


41.6 


14.9 


73.4 


26.5 


144.3 




(5.5) 


(14.3) 


(27.8) 


(0.9) 


(3.3) 


(1.4) 


(6.3) 


(2.8) 


(13.6) 



but losses in the estimation of b x (T), due to bias. The BEL.n aims to reach 
a compromise in the bias-variance trade-off, resulting in a better MSE 
than RQ. 

These findings are consistent with what we learned from the asymptotic 
comparisons shown in Table 2. The performance of BEL.n will of course 
depend on the choice of priors on the difference in slopes. The purpose 
of our study is not to demonstrate how to choose informative priors, but 
to show how informative priors can make a difference. Our empirical work 
shows that any reasonable choice of priors helps, even though an optimal 
choice is too much to ask for in general. 

5. An application to temperature downscaling. In recent decades much 
focus has been placed on understanding potential future climate changes. 
Meteorologists have developed various climate models to simulate atmo- 
spheric variables for both historical and future time periods under different 
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Table 4 

Simulation results for Models 3 and 4; see the caption of Table 3 for more details 



Adjusted intercepts Slopes 



Methoda(0.9) a(0.925) a(0.95) b x (0.9) b z (0.9) 6 X (0.925) b z (0.925) b x (0.95) b z (0.95) 













Model 










BJiL.s 


90.2 


103.0 


138.7 


31.0 


35.9 


34.3 


42.6 


42.7 


66.2 




(5.2) 


(5.8) 


(9.0) 


(1.9) 


(2.4) 


(2.0) 


(2.6) 


(2.6) 


(4.7) 


BEL.n 


95.6 


111.0 


129.4 


37.1 


41.4 


44.9 


41.5 


54.8 


41.7 




(5.6) 


(6.1) 


(7.5) 


(2.3) 


(2.8) 


(2.7) 


(2.9) 


(3.3) 


(2.8) 


BEL.c 


104.3 


119.2 


143.0 


37.7 


43.5 


45.3 


43.5 


62.7 


43.5 




(6.9) 


(7.2) 


(8.1) 


(2.3) 


(2.9) 


(2.7) 


(2.9) 


(3.3) 


(2.9) 


CQR 


94.6 


102.8 


118.0 


32.8 


38.4 


33.8 


38.4 


42.5 


38.4 




(5.9) 


(6.3) 


(7.3) 


(2.0) 


(2.6) 


(1.9) 


(2.6) 


(2.4) 


(2.6) 


RQ 


91.4 


106.9 


132.5 


30.6 


33.8 


35.0 


42.4 


42.9 


59.2 




(5.3) 


(6.8) 


(8.3) 


(1.9) 


(2.2) 


(2.0) 


(2-8) 


(2.5) 


(3.9) 












Model 


4 








BEL.s 


334.5 


507.5 


1085.1 


96.5 


134.1 


144.6 


213.5 


252.4 


547.6 




(25.7) 


(40.0) 


(109.5) 


(8.4) 


(io.i) 


(12-3) 


(17.6) 


(19.9) 


(57.8) 


BEL.n 


277.0 


346.8 


518.0 


97.3 


124.7 


125.6 


124.7 


196.9 


125.1 




(22.2) 


(22.2) 


(30.0) 


(5.8) 


(9.2) 


(6.1) 


(9.3) 


(8.6) 


(9.3) 


BEL.c 


391.6 


453.4 


659.8 


111.9 


160.5 


137.2 


160.5 


214.7 


160.5 




(42.2) 


(44.6) 


(62.8) 


(7.6) 


(16.9) 


(7.2) 


(16.9) 


(8.7) 


(16.9) 


CQR 


530.1 


520.0 


663.9 


142.0 


195.3 


140.1 


195.3 


175.0 


195.3 




(56.4) 


(52.1) 


(58.2) 


(12.0) 


(18.8) 


(10.3) 


(18.8) 


(9.5) 


(18.8) 


RQ 


340.3 


552.0 


1014.6 


102.9 


123.9 


154.6 


215.0 


252.7 


481.5 




(25.9) 


(56.6) 


(101.1) 


(8.3) 


(9.6) 


(11.5) 


(19.0) 


(17.8) 


(46.0) 



greenhouse gas emission scenarios. Statistical downscaling approaches utilize 
those large-scale model simulations to predict small-scale regional climate 
changes; see Wilby and Wigley (1997) for a review. Quantifying nearly ex- 
treme events in climate studies is an important task, for which quantile 
regression is a naturally appealing tool. However, high quantiles are usually 
hard to estimate with RQ due to the inherently limited number of observa- 
tions in the tail of the distributions. In this section, we consider the BEL 
methods for statistical downscaling of daily maximum temperature. We used 
the observed daily maximum temperature (TMAX) of Aurora, IL station 
from 1957-2002 as the response variable. The predictors are the simulated 
daily maximum temperature (RTEM) and an indicator of wet days (RAIN) 
from the ERA-40 reanalysis model introduced in Uppala et al. (2005). A wet 
day is denoted by RAIN = 1 , when the precipitation from ERA-40 is more 
than 1.2 kg/s/m 2 . About 30% of the days are categorized as wet days in 
Aurora. We used the following linear quantile regression model: 



(5.1) Q T (TMAX|RTEM, RAIN) = a(r) + 6 x (r)RTEM + 6 2 (r)RAIN 
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Table 5 

Average effective sample sizes of the Markov chains used in the downscaling example 



Method 


SPLIT 1 


SPLIT 2 


SPLIT 3 


BEL.c 


976 


933 


364 


BEL.z 


542 


633 


747 


BEL.t 


672 


701 


515 



at high quantiles r = 0.99,0.995,0.999. The quantile at r = 0.999 is nearly 
extreme relative to our sample size, so the asymptotic theory developed in 
this paper might be questioned. We choose to consider such high quantiles 
partly to test the limits of our BEL methods. 

We applied the following BEL methods with normal priors N{0, 1000 2 ) on 
each parameter to estimate the parameters of Model (5.1), unless otherwise 
specified: 

• BEL.c and BEL.s as introduced in Section 4. 

• BEL.z: the BEL estimator that assumes 6 Z (0.99) = 6^(0.995) = b z (0.999). 

• BEL.t: the BEL estimator that assumes that given 6^(0.99) and 6^(0.99), 
(6 X (0.995) - 6 a .(0.99))/0.02, (6 Z (0.995) - 6,(0.99))/0.14, (^(0.999) - 
6 x (0.99))/0.35 and (6^(0.999) - 6*(0.99))/1.16 are independent priors as 
the t distribution with degrees of freedom 3. 

The scaling used in the prior distributions of BEL.t was chosen in rough 
proportion to the variances of those parameter estimates from RQ, and no 
optimality is claimed here. To assess the performances of various methods, 
we randomly split the data from each year into two parts, a fitting period and 
a testing period, with equal sizes of 7889 days in each part. We used the BEL 
methods and RQ for the fitting period in estimating the model parameters 
and then applied the fitted model to the testing period to predict the rth 
quantile of TMAX. We randomly split the data three times, and labeled 
them as SPLIT 1, SPLIT 2 and SPLIT 3, respectively. The average effective 
sample sizes of the Markov chains for the BEL methods used here are shown 
in Table 5, as calculated by the R function effectiveSizeQ in the R package 
coda. 

Table 6 reports the normalized differences as a performance validation 
measure, 

(5.2) d- 



y/T(l-r)n' 



where n is the total number of days for prediction, O is the number of days 
when the observed TMAX exceeds the predicted rth quantile of TMAX 
and E indicates the expected number of days, that is, E = n(l — r). The 
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Table 6 

The table presents the normalized differences calculated by (5.2). The row names provide 
the method used for model fitting. In the column names, the Whole period indicates all 
the data in the testing period are used; Lower RTEM indicates the testing data with 
RTEM below its median; Wet days indicates the testing data with RAIN equals to 1 



Whole period Lower RTEM Wet days 



Method 


>0.99 


>0.995 


>0.999 


>0.99 


>0.995 


>0.999 


>0.99 


>0.995 


>0.999 














SPLIT 1 












BEL, 


c 


0.012 


0.089 


0.040 


-0.871 


-0.163 


1.036 


0.402 





,142 


-0.316 


BEL, 


,z 


0.012 


0.089 


0.040 


-0.230 


-0.163 


-0.476 


0.804 





,142 


-0.316 


BEL, 


t 


0.012 


0.089 


0.040 


-1.351 


-0.163 


-1.483 


-0.201 





,142 


0.949 


RQ 




-2.930 


-2.306 


-1.742 


-2.151 


-1.743 


-1.987 


-1.005 


-1 


276 


-1.582 














SPLIT 2 












BEL, 


c 


0.012 


0.089 


0.040 


0.250 


0.515 


1.540 


2.659 


1 


,591 


0.329 


BEL, 


z 


0.012 


0.089 


0.040 


1.530 


-0.163 


0.532 


0.642 





,452 


0.329 


BEL, 


t 


0.012 


0.089 


0.040 


1.050 


1.192 


0.532 


1.650 


1 


022 


0.329 


RQ 




0.012 


-0.390 


3.958 


1.370 


0.063 


1.540 


0.843 





,737 


4.139 














SPLIT 3 












BEL 


c 


0.012 


0.089 


0.040 


-1.511 


-0.163 


0.532 


-2.188 


-1 


543 


-0.308 


BEL, 


z 


0.012 


0.089 


0.040 


0.250 


-0.163 


-1.483 


-1.381 


-0 


,974 


0.962 


BEL, 


t 


0.012 


0.089 


0.040 


-0.871 


-0.163 


-0.979 


-0.776 


-0 


,690 


1.596 


RQ 




-0.666 


-0.869 


-2.454 


0.250 


-0.388 


-1.483 


-1.583 


-1 


259 


-1.577 



normalized differences are shown for the whole testing period, as well as for 
two subsets, one subset being the lower half of RTEM, and the other subset 
being the wet days (RAIN = 1). The use of these ad hoc subsets is meant 
to assess performances more comprehensively. The normalized differences 
greater than 2 in absolute values are marked as bold in Table 6, from which 
we have the following observations. First, over the whole testing period, 
the normalized differences of each BEL method are stable across random 
splits, but those from RQ predictions vary noticeably. For the testing peri- 
ods and for the selected subsets, the BEL methods perform better than RQ, 
especially at r = 0.999. Second, among the BEL methods, BEL.c performs 
relatively worse, but BEL.t and BEL.z do well. When we used the ANOVA 
test of Koenker and Bassett (1982) for the null hypothesis of common slopes 
at r = 0.99,0.995,0.999, the hypothesis of & x (0.99) = ^(0.995) = 6 X (0.999) 
was rejected at 5% level of significance. This helps explain the inferior per- 
formance of BEL.c relative to the other BEL methods, but all of them out- 
perform RQ. 

Our empirical study shows that BEL methods can easily improve on RQ as 
downscaling methods for high quantiles. Informative priors will help further 
if the "prior makers" are well informed. In climate studies, for example, 
historical data are generally available from multiple stations nearby, which 
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can lead us to helpful informative priors on slope parameters in the quantile 
models. In this sense, the shrinking priors considered in Theorem 3.3 are 
relevant. 

A natural question in climate downscaling is the autocorrelation of mea- 
surements over time. In this section we have bypassed this issue on two 
grounds. First, the quantile regression estimation under the working as- 
sumption of independence is typically consistent under weakly dependent 
models; see He, Zhu and Fung (2002). Second, we verified empirically that 
the autocorrelation in TMAX was well represented by the autocorrelations 
in the predictors used in Model (5.1), and the signs of the residuals of the 
quantile models were nearly uncorrelated. In more general applications, how- 
ever, it will be desirable to incorporate dependence in an appropriate way, 
and future research is clearly called for in this regard. Another interesting 
area of future work is to perform downscaling at a group of stations and 
include spatial correlation in the model. A recent paper by Reich, Fuentes 
and Dunson (2011) made a successful attempt at Bayesian spatial quan- 
tile regression, and the idea of BEL with informative priors can be further 
explored in spatial modeling. 

6. Discussion. In this paper, we propose using empirical likelihood as 
a working likelihood for quantile regression in Bayesian inference. We jus- 
tify the validity of the posterior based inference by establishing its first 
order asymptotics. The BEL approach avoids the daunting task of directly 
maximizing the EL function and allows informative priors to be utilized. 
Although the idea of Bayesian quantile regression is not new, the work pro- 
vides an important addition to the literature by providing the basic theory 
for incorporating possibly informative priors on multiple quantiles. The ef- 
ficiency gains are demonstrated through both theoretical calculations and 
empirical investigations, when some common features across quantiles are 
explored. If common slopes are assumed, it is hard for the CQR method to 
find optimal weights in balancing the quantile loss function at different r 
levels, but the empirical likelihood approach does so naturally. The use of 
informative priors is also related in spirit to penalized optimization, but the 
lack of a good overall objective function for several quantile levels makes the 
usual regularization method difficult to formulate. The EL approach has the 
ability to adapt automatically across quantile levels, and the BEL approach 
enables flexible priors to be utilized in a simple way. Our theoretical frame- 
work of shrinking priors provides good understanding of how informative 
priors and likelihood can complement each other in the BEL approach. 

This paper uses empirical likelihood, but some of its variants such as the 
ETEL, may work as well. The recent work of Lancaster and Jun (2010) pro- 
vided an approximation to the posterior from the Bayesian ETEL of quantile 
regression at a given r. Although their approximation was not strong enough 
to imply posterior convergence for the Bayesian ETEL, it can be strength- 
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ened using the approach we provide for BEL. We hope that comparisons in 
a broader class of working likelihoods together with efficient algorithms will 
be further developed in the future. 

APPENDIX: PROOFS 

We begin with lemmas about the smoothness properties of functions 
involving the estimating functions (2.2). Note that the estimating func- 
tions (2.2) involve an indicator function, and as a result, the results ob- 
tained in Qin and Lawless (1994) for smooth functions do not apply. While 
the work of Qin and Lawless (1994) relies on the Taylor expansions, our 
proof uses the general theorem related to M-estimators in van der Vaart 
(1998) and the quadratic expansion approximating the EL function pro- 
vided in Molanes Lopez, Van Keilegom and Veraverbeke (2009). We use Xj 
to indicate the jth component in the covarariates vector X for j = 0, . . . ,p, 
that is, X = (xo, x±, ■ ■ ■ , x p ) with xq = 1. 

A.l. Preparatory results. We discuss the properties of functions involv- 
ing the estimating function m(X,Y,£). Under Assumptions 3.2 and 3.3 
about Gx and Fx, E{m(X, Y, £)} can be sufficiently smooth. 

Lemma A.l. Under Assumptions 3.2 and 3.3, we have the following 
results: 

(LI) E{m(X, Y, £)} and E{m(X, Y, Qrn(X, Y, C) T } are twice continuously 
differentiable with respect to (. 

(L2) There exist k(p + 1) dimensional compact neighborhoods Cg and Q 
around 0, in which E[m(X, Y, C)/{1 + S, T m(X, Y, £)}] is twice continuously 
differentiable in ( G C c and££C x , and E[m(X,Y,()m(X,Y,() T /{l + ( Y m(X, 
Y, £)}] is uniformly continuous with respect to £ € and £ G C\. 

Proof. To show (LI), note that for each d = 0, . . . , k— 1 and j = 0, . . . ,p, 
there is 

E{m dk+j (X,Y,p(T))} = E{(l {Y < X T^ Td+1 )} ~ T d+1 )Xj} 

= E X [Xj{Ey\x(l{Y<X T l3(T d+1 )} - T d+l)}} 

= Ex[x J {Fx(X T (3(r d+1 )) - r d+1 }]. 

Under Assumptions 3.2 and 3.3, E{m(X, Y, £)} is twice continuously differ- 
entiable. Consider the cases i < I for the second moments. By the definition 
of regression quantiles, A T /3(rj) < A T /3(r/), and therefore, 

E{m ik+ j(X, Y, ()mt k + m (X, Y, ()} 

= Ex[xjX m {E Y \x(l{Y<X T /3(T i+1 )} ~ T i+l)( 1 {K<X T /3(r ;+1 )} ~ T l+l)}\ 
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= E x [x j x m {F x (X T p(r i+1 )) - t 1+1 F x {X t p(T i+l )) 
- t 1+1 F x (X t I3(t 1+1 )) +r i+1 T l+1 }], 

which is twice continuously differentiable in £. 
Similarly, (L2) follows from 

m dk+j (X,Y,C) 

l-T d+1 ) Xj , v To, ^ 77 t V T 



Ex 



£ \'^ {Fx(X^{r s+l ))-F x {X^{r s ))} 

0<s<d ? s 



£ T^7fe{^(X T /3(r s+1 ))-F x (X T /3(r s ))} 



d<s<fc 

where we assume to = 0, Tfc+i = 1, raj = ((1 — T\)X T , . . . , (1 — Tk)X T ) T and 
m* s = (-nX T , -t s X t , (1 - r s+1 )X T , . . . , (1 - r fc )X T ) T for s = l,...,k. 
Because m* is bounded, 1 + £ T m* could be bounded away from for £ in 
a sufficiently small compact neighborhood C^. Then E[m ( ik+j{X,Y,C t k)/{^ + 
£ T m(X, Y, £)}] is also twice continuously differentiable in £ and £. Similarly, 
we have E[m(X, Y, Qm(X, Y, £) T /{1 + £ T m(X, Y, £)}] is uniformly continu- 
ous with respect to £ S Q and £ € Ca- □ 

A.2. Consistency of the MELE. By Assumptions 3.2-3.4, the equation 
E^mpT, Y, £)} = has the unique solution (q. Define 

n 

(A.l) r n (C) = -n" 1 ^log{l + X n (0 T m(X u Yi,C)}, 



where A n (£) satisfies 



_ + A n (C) T m(x 4 ,y 4 ,c) 



0. 



Recall that 

C = argmax{r n (C)}, 
we define the expected value of T n (() as 
(A.2) T(C) = -E[log{l +aO T m(X,Y,C)}], 

where satisfies 

f m(x,y,Q ) 
\i + e(C) T m(x,y,c)J ' 

By Lemma A.l, Assumption 3.5, and the implicit function theorem, 
uniquely exists in the neighborhood C>, of € To show that £ is 
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a consistent estimator of Co> it is sufficient to check the conditions of Theo- 
rem 5.7 of van der Vaart (1998). That is, we shall check 

(A.3) su P |r„(c)-r(c)|4o 

c 

and 

(A.4) sup r(c)<r(Co) 

IIC-Co||>e 

for any £ within the compact neighborhood Cq of Co an d e > 0. 

Lemma A. 2. Under Assumptions 3.1-3.5, (A.4) holds. 

Proof. It is easy to see £(Co) = because E{m(X, Y, Co)} = 0, and then 
r(Co) = 0. By the Taylor expansion, we have 

r(0 _ - m r E l rnjX.YQ | _ If M^.y.O)' 



i + {(C) T m(x,y;c)J 2 l(i + Q (c) T m(x,r,c)) 2 

for some a(C) ° n the line segment between and £(£)■ On the right-hand 
side of the above equation, the first term equals 0, and the second term with 
the negative sign included is strictly negative, and thus r(C) < for C 7^ Co- 
So within the compact neighborhood of Co> we have 

sup r(0<r(Co). 

IIC-Cb||>e □ 
To check (A.3), we first expand T n (C) — T(C) as 

(A.5) r n (c)-r(c) = Qi + Q 2 , 

where 

Ql = -n~ 1 [log{l + A n (C) T m(^,^,C)}] 

l<i<n 

+ E[\og{l + \ n (() T m(X i ,Y i ,()}}, 
Q 2 = -£[log{l + X n (C) T miX^YiX)}} + £[log{l +t(Q T m(X i ,Y i ,0}]. 
To show the uniform convergence of (A.5), we need the following lemma. 

Lemma A.3. (i) The class of constant functions: Cq = {A, A € C} is 
P-Glivenko-Cantelli (P-G-C) class, where C is some compact set in R. 
(ii) For bounded X , the class of functions 

m(X,Y,Q 
l + \T m (X,Y,() 



Fi = { - , , T -'/J- ^ an :C€^c A " £ ^A ? and 



T 2 = {log({l + C T m(A, Y, C)} : C € C f , £ € C x } 

are P-G-C, where C\ is a compact neighborhood around € R fc ( p+1 ), and 
is a compact neighborhood around Co £ 
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Proof, (i) According to Theorem 8.14 of Kosorok (2008) and the fact 
that Co is a collection of bounded functions, we only need to show that Co is 
VC-class, as defined in Section 9.1.1 in Kosorok (2008). The P-measurability 
will be guaranteed by the measurability and boundedness of the constant 
functions in Cq. The collection of all subgraphs of functions in Co is So = 
{(x, y),y < A}. For any two points (xi,yi), (x2 5 2/2) in K 2 , assume y\ < y2, it is 
impossible that So would include (£2,2/2) while excluding (xi,yi). Therefore, 
based on the definition of VC-subgraph Class, we have VC(Co) = 2 < 00, i.e., 
Co is a VC class, (ii) From Lemma 9.12 and Lemma 9.8 of Kosorok (2008), 
we know that the class of indicator functions Qo = {l{y<x T /3}>/^ £ K p+1 } is 
a VC-class. From (vi) and (vii) in Lemma 9.9 of Kosorok (2008), the sets of 
estimating functions 



1 < d < k, are VC-class. Because X is bounded, Gd is P-G-C class by The- 
orem 8.14 of Kosorok (2008). Then by Theorem 9.26 of Kosorok (2008), it 
follows that J 7 ! and T2 are P-G-C. □ 

We now verify (A. 3). We will check the uniform convergence of Q\ and Q2 
in (A. 5). Because J~2, in which £ is not related to (X, Y), is P-G-C, the 
uniform convergence implied by P-G-C guarantees the convergence of Q\. 
For Q2, because log{l + £(C) T m(Aj, Y{, ())} is bounded, by the dominate 
convergence theorem, we only need to show A n (£) — >• uniformly in £. 
Because X n (C) is actually a Z-estimator, the approximate zero of a data- 
dependent function of as defined in Chapter 5.1 in van der Vaart (1998), 
then by using the standard arguments of Z-estimator in van der Vaart (1998) 
and by the fact that T\ is P-G-C, we have X n {C) £(C) uniformly in (. 

The proof of Theorem 3.1 is now complete. 

A. 3. Asymptotic normality of the posterior. In our notation, we have 



where 1Z n (C) is the empirical likelihood ratio of £. To expand r n (£) up to the 
quadratic term, we use Assumption 3.5. We also use the following lemma, 
which is taken from the quadratic expansion provided in Lemma A. 6 of 
Molanes Lopez, Van Keilegom and Veraverbeke (2009) but formulated to 
suit our setting. 

Lemma A. 4. Assume that the results of Lemma A.l and Theorem 3.1 
hold. Under Assumptions 3.1-3.5, and additional conditions (C1)-(C3) listed 
below, we have 



r„(C) = -UC ~ Co) T v 1 T 2 v 1 i 1 v 1 2(C - Co) + ™~ 1/2 (C - (ofv^v^Mn 

-In-iMjv^Mn + Opin- 1 ) 



Gd ~ {(1{Y<XT /3( Td )} 



Ti)x 3 ^{T d )eMP +1 ^<j<p} 



(A.6) 



log{ft„(C)} = nr n ,(C) 
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uniformly in (, for C — Co = 0{n~ 1 / 2 ), and 

(A.8) C - Co = n-^ 2 (V^ y-iy 12 )-iyTy-l Mn + o p {n~ 1 / 2 ), 

where C is the MELE of Co, M n = n" 1 / 2 Yh=\ m(X u Y h Co) and V\\ and V 12 
are the same as defined in Theorem 3.2. 

(CI) ||Ej=iM*i,*U) - E{m{Xi,YiX)}]\\ =O p (n 1 / 2 ), uniformly in C 
in a o(l)-neighborhood of Co- 

(C2) || E2=i KXi, y is C)m(Xi, 3* C) T - £{m(X*, y, C)mpQ, Y; , C) T }] || = 
o p (n), uniformly in C in a o(l)-neighborhood of Co- 

(C3) \\Etil m (Xi,YiX) ~ E{m(X u Y u ()} - m(X,Y,Co) + E{m(X,Y, 
Co)}] II = Opin 1 / 2 ), uniformly in ( for C - Co = O p {n~ 1 / 2 ). 

To use the expansion (A. 7), we shall verify that (C1)-(C3) are satisfied. 

Lemma A. 5. Under Assumptions 3.2-3.4, Conditions (C1)-(C3) are 
satisfied for the estimating functions m(X,YX) of (2.2). 

Proof. Because the collection of estimating functions m(X,YX) is 
P-Donsker class, we have (CI). By the fact that the collection of the product 
of the estimating functions is P-G-C, we have (C2). By applying Lemma 4.1 
of He and Shao (1996) to m(X,Y,Q, we obtain (C3). □ 

Proof of Theorem 3.2. By Lemma A. 5, Lemma A. 4, (A. 7) and (A. 6), 
we have 

p((\D)= Po (()xn n (() 

= p (C) x exp j-^(C - Co) T ^iI^7 V 12 (C - Co) 

+ ™ 1/2 (C - Co) T V^V^Mn - ^Mjv^Mn + 0p (l)j. 
Because of (A.8), we have 

P((\D) =p (C) x exp j-^(C - CoVV^VniC - Co) 

+ n(C-Co) T ^nV 12 (C-Co) 

-^MlV^Mn + Opil) 
= po(C) x exp j-^(C - Co) T V 1 ^ 1 T 1 Vi 2 (C - 2C + Co) 
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= Po(C) x exp|-^(C - (Vv^V^VuiC - + Op(l) J. 

By Assumption 3.6, we have 

log{p (C)} = log{po(Co)} + 0(n- 1 /2 ) 

for C-Co = 0("- _1/2 ). Then we have 

(A.9) p(C|Z>) = Po(Co) exp{-i(C - () T Jn(( ~ C) + Op(l)h 

where J n = nV^V^Vu- For any n, we have p((\D) oc p{(\D), and thus (3.1) 
holds. 

Because J n is positive definite, we have 
(A.10) p(jV 2 (C ~ 0\D) oc exp{-|(jV 2 (C - C)) T (4 /2 (C ~ 0) + o p (l)} 
for any C — Co = O^?-" 1 / 2 ). Therefore, to show 

jV 2 (C-C)^iV(o,i), 

it remains to show that 

P(||jy 2 (C-C)ll>*)^o, 

— 1/2 

when 5 — >■ oo and n — >• oo. Prom (A.9), we have for any £ = £ + J n £, 
^n(C) xpo(C) A Po (Co)exp{-||t|| 2 /2}. 

Because of 7Z n (() x po(C) <Po(C)> by the dominate convergence theorem, 
we have 

/ Po(C + 4~ 1/2 ^n(C + 4" 1/2 O^^Po(Co) / exp{-||t|| 2 /2}dt 
J||i||>5 "Ml*||><5 

for any S > 0. Then it leads to 

il| t ||> 5 P0(C + Jn 1/2 t)K n (C + Jn 1/2 t) dt 



P(||4 /2 (C-C)II>^) 



J M>0 P0(( + Jn 1/2 t)TZ n (C + J n V2 t) dt 

ij |f||> ,exp{-||t|| 2 /2}dt 
Jj |t||>0 exp{-||t|| 2 /2}^ 

(27r) -fc(p+D/2 f exp{-\\t\\ 2 /2}dt 

J\\t\\>6 



< e 

for sufficiently large 5. □ 
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Proof of Theorem 3.3. Similar to the proof of Theorem 3.2, we have 



n 



(A.ll) p((\D) =p ,n(C) x exp j--(C - C) ' V^V^VuiC - () + o p (l) 
By Assumption 3.7, we have 

log{po,n(C)} = log{po,n(Co,n)} ~ |(C ~ Co,n) T Jo,n (C ~ Co,n) + p (l) 

for IIC — ^o|| = 0(ra -1 / 2 ) and bounded Co,n- Combined with (A.ll), we have 

p((\D) = C n exp{-i(C - ^post) T Jn(C - #post) + Rn}, 

where J n = J ,„ + nV^V^Vn, 6> post = ^ 1 (Jb,nCo,n + ^ll^n^C)) #n = 
Op(l), and C n is some constant that does not depend on £, and has the 
following expression: 



n i 



Cn =PO,n(Co,n)exp|--C 0irl Jo,nCo,n - "C ^12^11 + ^post^post j- 

Therefore, we have (3.3). □ 

Proof of Corollary 3.4. The prior density po,n(C) can be written 

as 

logp ,n(C) = C + log^iCn-V 2 ^^) - &,o))} 

+ ^log{ 5(i (S- 1/2 (/3(T d )-/3(r 1 )))}, 

d=2 

where C is some constant not depending on Q. Clearly, the prior mode is 
P( T d) = Pp,o for all d = 1, . . . , k. Then we have 



a 2 logp ,n(C) 



«/3 2 (T!) 



C=l fe ®/3p,o 



and for d = 2, . . . , k, 



a 2 logp ,n(C) 



aP{T\)aP(Td) 



a 2 logp ,n(C) 



n- 1/2 g?(Q)"- 1/2 i y 1/2 gg(Q)Sd 1/2 
ffi(O) ^ » d (o) 

^log^^S" 1 / 2 ^^)-/?^)))} 



C=lfe®/3p,o 



aP(n)aP(Td) 



C=lfe®/3 P ,o 



-1/2 // 



sS(o)s; 



-1/2 



a/3 2 (r d ) 



<te(0) 



C=lfe®/3 P ,o 



a(3 2 (T d ) 

^ 1/2 gg(Q)si 1/2 



C=lfe(gi/3p,o 
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Note that for a spherically symmetric g<i with its mode and center as zero, 
we have 



9d(0) 



C d I, 



where I is the (p+ 1) x (p+ 1) dimensional identity matrix, and > are 
constants for d = 1, . . . , k. Then, we have 

( k \ 

d=2 
-C2S2 



Jo,n 



c 2 s 2 1 







and therefore, 

Jo,n{(o,n — Co) 

/ 



CiSl-HPpfl ~ A)(n)) + £ C d £^j(/Vfa) - A),/(n)) 
C 2 E^(A),j(ti)-A)^(75)) 



where fioj(Td) is the intercept parameter in ^o( T d)- Under the assumption 
in (3.4) and (3.5), ||Jo,n(Co,n ~~ Co) II = 0(e n ) and ||Jo,n|| is increasing at the 
rate of n. Then Assumption 3.7 is satisfied, and Theorem 3.3 applies. 
Note that the posterior mean pos t m Theorem 3.3 can be written as 

#post = Co + nJ~ 1 V 12 V 1 - i L Vvi{C ~ Co) — Jo,n{t,0,n ~ Co)- 

By (A. 8), we have ||C — Coll = O p (n -1 / 2 ). Then we have the posterior mean 

# P ost = Co + 0. p {e n /n + n- 1 ' 2 ). □ 
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SUPPLEMENTARY MATERIAL 

Supplement to "Bayesian empirical likelihood for quantile regression" 

(DOI: 10.1214/12-AOS1005SUPP; .pdf). The supplementary material con- 
tains additional details on the implementation of the Bayesian computations 
used in the empirical studies reported in this paper. 
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